Abstract



The generation interval is the time between the infection time of an infected
person and the infection time of his or her infector. Probability density
functions for generation intervals have been an important input for epidemic models
and epidemic data analysis. In this paper, we specify a general stochastic SIR 
epidemic model and prove that the mean generation interval decreases when 
susceptible persons are at risk of infectious contact from multiple sources. The 
intuition behind this is that when a susceptible person has multiple potential 
infectors, there is a "race" to infect him or her in which only the first infectious 
contact leads to infection. In an epidemic, the mean generation interval con- 
tracts as the prevalence of infection increases. We call this global competition 
among potential infectors. When there is rapid transmission within clusters of 
contacts, generation interval contraction can be caused by a high local preva- 
lence of infection even when the global prevalence is low. We call this local 
competition among potential infectors. Using simulations, we illustrate both 
types of competition. Finally, we show that hazards of infectious contact can be 
used instead of generation intervals to estimate the time course of the effective 
reproductive number in an epidemic. This approach leads naturally to partial 
likelihoods for epidemic data that are very similar to those that arise in survival 
analysis, opening a promising avenue of methodological research in infectious 
disease epidemiology. 



1 Introduction 



In infectious disease epidemiology, the serial interval is the difference between 
the symptom onset time of an infected person and the symptom onset time of his 
or her infector [1]. This is sometimes called the "generation interval." However,
we find it more useful to adopt the terminology of Svensson [2] and define the 
generation interval as the difference between the infection time of an infected 
person and the infection time of his or her infector. By these definitions, the 
serial interval is observable while the generation interval usually is not. We 
define infectious contact from i to j to be a contact that is sufficient to infect j 
if i is infectious and j is susceptible, and we define a potential infector of person
i to be an infectious person who has positive probability of making infectious 
contact with i. Finally, we use the term hazard rather than force of infection to 
highlight the similarities between epidemic data analysis and survival analysis. 

The generation interval has been an important input for epidemic models 
used to investigate the transmission and control of SARS [3,4] and pandemic 
influenza [5,6]. More recently, generation interval distributions have been used 
to calculate the incubation period distribution of SARS [7] and to estimate $R_0$
from the exponential growth rate at the beginning of an epidemic [8]. It is 
generally assumed that the generation interval distribution is characteristic of 
an infectious disease. In this paper, we show that this is not true. Instead, 
the expected generation interval decreases as the number of potential infectors 
of susceptibles increases. During an epidemic, generation intervals tend to 
contract as the prevalence of infection increases. This effect was described by 
Svensson [2] for an SIR model with homogeneous mixing. In this paper, we 
extend this result to all time-homogeneous stochastic SIR models. 

A simple thought experiment illustrates the intuition behind our main result. 
Imagine a susceptible person $j$ in a room. Place $m$ other persons in the room
and infect them all at time $t = 0$. For simplicity, assume that infectious contact
from $i$ to $j$ occurs with probability one, $i = 1, \ldots, m$. Let $t_{ij}$ be a continuous
nonnegative random variable denoting the first time at which $i$ makes infectious
contact with $j$. Person $j$ is infected at time $t_j = \min(t_{1j}, \ldots, t_{mj})$. Since
all infectious persons were infected at time zero, $t_j$ is the generation interval.
If we repeat the experiment with larger and larger $m$, the expected value of
$\min(t_{1j}, \ldots, t_{mj})$ will decrease.
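
The thought experiment can be checked numerically. The following is a minimal sketch, assuming purely for illustration that the contact times are independent exponential random variables with rate 1 (the text does not fix a distribution); the function name and its parameters are hypothetical.

```python
import random


def mean_generation_interval(m, rate=1.0, trials=20000, seed=0):
    """Monte Carlo mean of min(t_1j, ..., t_mj) with m potential infectors.

    Assumption (not from the paper): contact times are i.i.d. exponential
    with the given rate. Any continuous nonnegative distribution would do.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Person j is infected by whichever infector makes contact first.
        total += min(rng.expovariate(rate) for _ in range(m))
    return total / trials


if __name__ == "__main__":
    for m in (1, 2, 5, 10):
        print(m, round(mean_generation_interval(m), 3))
```

Under this assumption the estimated mean should shrink roughly like 1/m as the room fills with infectors.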

When a susceptible person is at risk of infectious contact from multiple 
sources, there is a "race" to infect him or her in which only the first infectious 
contact leads to infection. Generation interval contraction is an example of a 
well-known phenomenon in epidemiology: The expected time to an outcome, 
given that the outcome occurs, decreases in the presence of competing risks. In 
our thought experiment, the outcome is the infection of j by a given i and the 
competing risks are infectious contacts from all sources other than i. 

Adapting our thought experiment slightly, we see that the contraction of the 
generation interval is a consequence of the fact that the hazard of infection for j 
increases as the number of potential infectors increases. Let $\lambda(t)$ be the hazard
of infectious contact from any potential infector to $j$ at time $t$ and let $E[t_j \mid m]$
be the expected infection time of $j$ given $m$ potential infectors. Then

$$E[t_j \mid m] = \int_0^\infty \exp\left(-m \int_0^t \lambda(u)\, du\right) dt,$$

so the expected generation interval decreases as the number of potential infec- 
tors increases. A hazard of infection that increases with the number of potential 
infectors is a defining feature of most epidemic models, so generation interval 
contraction is a very general phenomenon. We note that a very similar phe- 
nomenon occurs in endemic diseases, where increased force of infection results 
in a decreased average age at first infection [9] . 
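
As a worked special case, suppose (an assumption made only for illustration) that the $m$ potential infectors act independently and that each has the same constant hazard $\lambda$ of infectious contact with $j$. The expected infection time then follows from integrating the survival function of the minimum:

```latex
\begin{align*}
\Pr(t_j > t \mid m) &= \left(e^{-\lambda t}\right)^m = e^{-m \lambda t}, \\
E[t_j \mid m] &= \int_0^\infty e^{-m \lambda t}\, dt = \frac{1}{m \lambda}.
\end{align*}
```

With $\lambda = 1$, the expected infection time is 1 for a single potential infector, 1/2 for two, and 1/10 for ten.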

The rest of the paper is organized as follows: In Section 2, we describe a 
general stochastic SIR epidemic model. In Section 3, we use this model to 
show that the mean generation interval decreases as the number of potential 
infectors increases. As a corollary, we find that the mean serial interval also 
decreases. In Section 4, we consider the role of the population contact structure 
in generation interval contraction and illustrate the effects of global and local 
competition among potential infectors with simulations. In Section 5, we argue 
that hazards of infectious contact should be used instead of generation or serial 
interval distributions in the analysis of epidemic data. Section 6 summarizes 
our main results and conclusions. 

2 General stochastic SIR model 

We start with a very general stochastic "Susceptible-Infectious-Removed" (SIR)
epidemic model. This model includes fully-mixed and network-based models as 
special cases, and it has been used previously to define a mapping from the final 
outcomes of stochastic SIR models to the components of semi-directed random 
networks [10, 11]. 

Each person $i$ is infected at his or her infection time $t_i$, with $t_i = \infty$ if $i$
is never infected. Person $i$ recovers from infectiousness or dies at time $t_i + r_i$,
where the recovery period $r_i$ is a positive random variable with the cumulative
distribution function (cdf) $F_i(r)$. The recovery period may be the sum of
a latent period, during which $i$ is infected but not infectious, and an infectious
period, during which $i$ can transmit infection. We assume that all infected
persons have a finite recovery period. If person $i$ is never infected, let $r_i = \infty$.
Let $\mathrm{Sus}(t) = \{i : t_i > t\}$ be the set of susceptibles at time $t$.

When person $i$ is infected, he or she makes infectious contact with person $j$
after an infectious contact interval $\tau_{ij}$. Each $\tau_{ij}$ is a positive random variable
with cdf $F_{ij}(\tau \mid r_i)$ and survival function $S_{ij}(\tau \mid r_i) = 1 - F_{ij}(\tau \mid r_i)$. Let $\tau_{ij} = \infty$ if
person $i$ never makes infectious contact with person $j$, so the infectious contact
interval distribution may have probability mass at $\infty$. Define

$$S_{ij}(\infty \mid r_i) = \lim_{\tau \to \infty} S_{ij}(\tau \mid r_i),$$

which is the conditional probability that $i$ never makes infectious contact with
$j$ given $r_i$. Since a person cannot transmit disease before being infected or
after recovering from infectiousness, $S_{ij}(\tau \mid r_i) = 1$ for all $\tau \le 0$ and $S_{ij}(\tau \mid r_i) =
S_{ij}(\infty \mid r_i)$ for all $\tau \ge r_i$. Since a person cannot infect himself (or herself),
$\tau_{ii} = \infty$ with probability one and $S_{ii}(\tau \mid r_i) = 1$ for all $\tau$.

The infectious contact time $t_{ij} = t_i + \tau_{ij}$ is the time at which person $i$ makes
infectious contact with person $j$. If person $j$ is susceptible at time $t_{ij}$, then
$i$ infects $j$ and $t_j = t_{ij}$. If $t_{ij} < \infty$, then $t_j \le t_{ij}$ because person $j$ avoids
infection at time $t_{ij}$ only if he or she has already been infected. If person $i$
never makes infectious contact with person $j$, then $t_{ij} = \infty$ because $\tau_{ij} = \infty$.
Figure 1 shows a schematic diagram of the relationships among $r_i$, $\tau_{ij}$, and $t_{ij}$.

The importation time $t_{0i}$ of person $i$ is the earliest time at which he or she
receives infectious contact from outside the population. The importation time
vector is $t_0 = (t_{01}, \ldots, t_{0n})$.

We assume that each infected person has a unique infector. Following [4],
we let $v_i$ represent the index of the person who infected person $i$, with $v_i = 0$
for imported infections and $v_i = \infty$ if $i$ is never infected. If tied infectious
contact times have nonzero probability, then $v_i$ can be chosen from all $j$ such
that $t_{ji} = t_i < \infty$.

2.1 Epidemics 

Let $t_{(1)} \le t_{(2)} \le \ldots \le t_{(m)}$ be the order statistics of all $t_1, \ldots, t_n$ less than infinity,
and let $(k)$ be the index of the $k$th person infected. Before the epidemic begins,
an importation time vector $t_0$ is chosen. The epidemic begins at time $t_{(1)} =
\min_i(t_{0i})$. Person $(1)$ is assigned a recovery period $r_{(1)}$. Every person $j \in \mathrm{Sus}(t_{(1)})$
is assigned an infectious contact time $t_{(1)j} = t_{(1)} + \tau_{(1)j}$. The second infection
occurs at $t_{(2)} = \min_{j \in \mathrm{Sus}(t_{(1)})} \min(t_{0j}, t_{(1)j})$, which is the first infectious contact
time after $t_{(1)}$. Person $(2)$ is assigned a recovery period $r_{(2)}$. After $k$ infections,
the next infection occurs at $t_{(k+1)} = \min_{j \in \mathrm{Sus}(t_{(k)})} \min(t_{0j}, t_{(1)j}, \ldots, t_{(k)j})$. The
epidemic stops after $m$ infections if and only if $t_{(m+1)} = \infty$.
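
The construction above can be sketched as an event-driven simulation. This is a minimal illustration rather than the authors' implementation: the function names, the 1..n labelling of persons, and the use of a priority queue of pending infectious contacts are all assumptions made here. Taking the earliest pending contact to a still-susceptible person reproduces the nested minimum that defines each successive infection time.

```python
import heapq
import math
import random


def truncated_exponential(hazard):
    """Return a sampler for tau_ij: exponential with the given hazard,
    set to infinity (no contact) if it exceeds the recovery period r_i."""
    def draw(rng, i, j, r_i):
        tau = rng.expovariate(hazard)
        return tau if tau < r_i else math.inf
    return draw


def simulate_epidemic(n, contact_interval, recovery_period, imports, seed=0):
    """Event-driven sketch of the construction in Section 2.1.

    Persons are labelled 1..n; label 0 stands for sources outside the
    population. `imports` maps a person to an importation time t_0i.
    Returns (t, v): infection times and infectors of everyone infected.
    """
    rng = random.Random(seed)
    t, v = {}, {}
    # Pending infectious contacts as (contact time, source, target).
    pending = [(t0, 0, j) for j, t0 in imports.items()]
    heapq.heapify(pending)
    while pending:
        time, i, j = heapq.heappop(pending)
        if j in t:
            continue  # j was already infected; this contact has no effect
        t[j], v[j] = time, i
        r_j = recovery_period(rng, j)
        # Assign infectious contact times from j to every susceptible k.
        for k in range(1, n + 1):
            if k not in t:
                tau = contact_interval(rng, j, k, r_j)
                if tau < math.inf:
                    heapq.heappush(pending, (time + tau, j, k))
    return t, v


if __name__ == "__main__":
    n, r0 = 500, 2.0
    t, v = simulate_epidemic(
        n, truncated_exponential(r0 / (n - 1)), lambda rng, i: 1.0, {1: 0.0}
    )
    print(len(t), "persons infected")
```

The loop ends when no pending contact remains, which corresponds to the next infection time being infinite.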

3 Generation interval contraction 

In this section, we show that the mean infectious contact interval $\tau_{ij}$ given that
$i$ infects $j$ is less than or equal to the mean infectious contact interval given that
$i$ makes infectious contact with $j$. In the notation from the previous section,

$$E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty]$$

(note that $v_j = i$ implies $\tau_{ij} < \infty$ but not vice versa). In general, this inequality
is strict when j is at risk of infectious contact from any source other than i. This 
inequality implies the contraction of generation and serial intervals during an 
epidemic. For background on the probability theory used in this section, please 
see Ref. [12] or any other probability text. 
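Lemma 1 $E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty]$.

Proof. We first show that $E[\tau_{ij} \mid r_i, v_j = i] \le E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]$ and then use the
law of iterated expectation. If person $i$ was infected at time $t_i$ and has recovery
period $r_i$, then the probability that $\tau_{ij} < \infty$ is $F_{ij}(\infty \mid r_i) = 1 - S_{ij}(\infty \mid r_i)$. Let

$$F^*_{ij}(\tau \mid r_i) = \frac{F_{ij}(\tau \mid r_i)}{F_{ij}(\infty \mid r_i)}$$

be the conditional cdf of $\tau_{ij}$ given $r_i$ and $\tau_{ij} < \infty$. Then

$$E[\tau_{ij} \mid r_i, \tau_{ij} < \infty] = \int_0^{r_i} \tau \, dF^*_{ij}(\tau \mid r_i) = \int_0^{r_i} \left[1 - F^*_{ij}(\tau \mid r_i)\right] d\tau. \tag{1}$$

If person $j$ is susceptible at time $t_i$ and $\tau_{ij} < \infty$, then $v_j = i$ if and only if
$j$ escapes infectious contact from all other infectious people during the time
interval $(t_i, t_i + \tau_{ij})$. Let $S^*_{\cdot j}(t_i + \tau)$ be the probability that $j$ escapes infectious
contact from all sources other than $i$ in the interval $(t_i, t_i + \tau)$. Given $r_i$ and
$\tau_{ij} < \infty$, the conditional probability density for an infectious contact from $i$ to
$j$ at time $t_i + \tau$ that leads to the infection of $j$ is proportional to

$$S^*_{\cdot j}(t_i + \tau)\, dF^*_{ij}(\tau \mid r_i).$$

If we let

$$\Psi = \int_0^{r_i} S^*_{\cdot j}(t_i + \tau)\, dF^*_{ij}(\tau \mid r_i) = E[S^*_{\cdot j}(t_i + \tau_{ij}) \mid r_i, \tau_{ij} < \infty],$$

then

$$E[\tau_{ij} \mid r_i, v_j = i] = \frac{1}{\Psi} \int_0^{r_i} \tau\, S^*_{\cdot j}(t_i + \tau)\, dF^*_{ij}(\tau \mid r_i) = \frac{E[\tau_{ij}\, S^*_{\cdot j}(t_i + \tau_{ij}) \mid r_i, \tau_{ij} < \infty]}{\Psi}.$$

Since $S^*_{\cdot j}(t_i + \tau_{ij})$ is a monotonically decreasing function of $\tau_{ij}$,

$$E[\tau_{ij} \mid r_i, v_j = i] - E[\tau_{ij} \mid r_i, \tau_{ij} < \infty] = \frac{E[\tau_{ij}\, S^*_{\cdot j}(t_i + \tau_{ij}) \mid r_i, \tau_{ij} < \infty]}{\Psi} - E[\tau_{ij} \mid r_i, \tau_{ij} < \infty] = \frac{\mathrm{Cov}(\tau_{ij}, S^*_{\cdot j}(t_i + \tau_{ij}) \mid r_i, \tau_{ij} < \infty)}{\Psi} \le 0.$$

Therefore,

$$E[\tau_{ij} \mid r_i, v_j = i] \le E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]. \tag{2}$$

Since the same inequality holds for all $r_i$,

$$E[\tau_{ij} \mid v_j = i] = E[E[\tau_{ij} \mid r_i, v_j = i]] \le E[E[\tau_{ij} \mid r_i, \tau_{ij} < \infty]] = E[\tau_{ij} \mid \tau_{ij} < \infty] \tag{3}$$

by the law of iterated expectation. ■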

Equality holds in equation (2) if and only if $\tau_{ij}$ and $S^*_{\cdot j}(t_i + \tau_{ij})$ have covariance
zero given $r_i$ and $\tau_{ij} < \infty$. Since $S^*_{\cdot j}(t_i + \tau_{ij})$ is a monotonically decreasing
function of $\tau_{ij}$, this will occur if and only if $\tau_{ij}$ or $S^*_{\cdot j}(t_i + \tau_{ij})$ is constant given
$r_i$ and $\tau_{ij} < \infty$. Equality holds in equation (3) if and only if equality holds in
(2) with probability one in $r_i$. If $\tau_{ij}$ is constant, then clearly $S^*_{\cdot j}(t_i + \tau_{ij})$ is constant
and their covariance is zero. If $j$ is not at risk of infectious contact from
any source other than $i$, then $S^*_{\cdot j}(t_i + \tau_{ij})$ will be constant even when $\tau_{ij}$ is not.
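
The lemma and its equality condition can be illustrated by simulation. The sketch below rests on assumptions chosen here for convenience and not taken from the paper: person i and a number of rival sources are all infected at time zero, every contact time is exponential with the same hazard, i recovers at r_i = 1, and the rivals never recover. The function name is hypothetical.

```python
import random


def contact_interval_means(n_rivals, hazard=1.0, r_i=1.0, trials=100000, seed=1):
    """Monte Carlo estimates of E[tau_ij | tau_ij < inf] and E[tau_ij | v_j = i]."""
    rng = random.Random(seed)
    contact_sum, contact_n = 0.0, 0
    infect_sum, infect_n = 0.0, 0
    for _ in range(trials):
        tau = rng.expovariate(hazard)
        if tau >= r_i:
            continue  # i recovers first, so tau_ij is infinite
        contact_sum += tau
        contact_n += 1
        rivals = [rng.expovariate(hazard) for _ in range(n_rivals)]
        if all(tau < rival for rival in rivals):
            infect_sum += tau  # i made the first infectious contact
            infect_n += 1
    return contact_sum / contact_n, infect_sum / infect_n


if __name__ == "__main__":
    for n_rivals in (0, 1, 3, 9):
        given_contact, given_infection = contact_interval_means(n_rivals)
        print(n_rivals, round(given_contact, 3), round(given_infection, 3))
```

With no rivals the two means coincide, matching the equality case in which j is at risk from i alone; with rivals present the mean given infection falls below the mean given contact.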

In the thought experiment from the Introduction, the expected infection time
of the susceptible $j$ would remain constant in the following two scenarios: (i) all
infectious persons make infectious contact with $j$ at a fixed time $t_0$, or (ii) $j$ is
only at risk of infectious contact from a single person. Scenario (i) corresponds
to a constant $\tau_{ij}$ and scenario (ii) corresponds to a constant $S^*_{\cdot j}(t_i + \tau_{ij})$.

The expected generation interval from $i$ to $j$ given $v_j = i$ will be shortest
when the risk of infectious contact to $j$ from sources other than $i$ is greatest.
More specifically,

$$E[\tau_{ij} \mid r_i, v_j = i]$$

will be minimized when $S^*_{\cdot j}(t_i + \tau_{ij})$ decreases fastest in $\tau_{ij}$. In general, the risk
of infectious contact from other sources will be greatest when the prevalence of
infection is highest, so we expect the greatest contraction of the serial interval
during an epidemic to coincide with the peak prevalence of infection.

In general, we expect to see the following pattern over the course of an
epidemic: The mean generation interval decreases as the prevalence of infection 
increases, reaches a minimum as the prevalence of infection peaks, and increases 
again as the prevalence of infection decreases. 

3.1 Types of generation intervals 

In [2], Svensson discussed two types of generation intervals that are consistent
with the verbal definition given in the Introduction. $T_p$ ($p$ for "primary")
denotes $\tau_{ij}$ where $i$ is chosen at random from all persons who infect at least one
other person and $j$ is chosen randomly from the set of persons $i$ infects. $T_s$
($s$ for "secondary") denotes $\tau_{ij}$ where $j$ is chosen at random from all persons
infected from within the population and $i = v_j$. $T_p$ and $T_s$ differ only in the
sampling procedure used to obtain the ordered pair $ij$: $T_p$ samples primary cases
(infectors) at random while $T_s$ samples secondary cases at random. Equation
(3) implies that both $E[T_p]$ and $E[T_s]$ decrease when susceptible persons are at
risk of infectious contact from multiple sources. This contraction occurs because
the definitions of $T_p$ and $T_s$ include only $\tau_{ij}$ such that $i$ actually infected $j$.

3.2 Serial interval contraction 

In an epidemic, infection times are generally unobserved. Instead, symptom 
onset times are observed. Recall that the time between the onset of symptoms in 
an infected person and the onset of symptoms in his or her infector is called the 
serial interval. Contraction of the mean generation interval implies contraction
of the mean serial interval as well. The incubation period is the time from
infection to the onset of symptoms [1]. Let $q_i$ be the incubation period in
person $i$, and let $t_i^{\mathrm{sym}} = t_i + q_i$ be the time of his or her onset of symptoms. If
$v_j = i$, then the serial interval associated with person $j$ is

$$t_j^{\mathrm{sym}} - t_i^{\mathrm{sym}} = \tau_{ij} + q_j - q_i.$$

Therefore,

$$E[t_j^{\mathrm{sym}} - t_i^{\mathrm{sym}} \mid v_j = i] = E[\tau_{ij} \mid v_j = i] + E[q_j] - E[q_i] \le E[\tau_{ij} \mid \tau_{ij} < \infty] + E[q_j] - E[q_i],$$

with strict inequality whenever strict inequality holds for the corresponding 
generation interval. Over the course of an epidemic, we expect the mean serial 
interval to follow a pattern very similar to that of the mean generation interval. 



4 Simulations 

We refer to the "race" to infect a susceptible person as competition among po- 
tential infectors. In this section, we illustrate two types of competition among 
potential infectors: Global competition among potential infectors results from 
a high global prevalence of infection. Local competition among potential in- 
fectors results from rapid transmission within clusters of contacts, which causes 
susceptibles to be at risk of infectious contact from multiple sources within their 
clusters even if the global prevalence of infection is low. In real epidemics, the 
prevalence of infection is usually low but there is clustering of contacts within 
households, hospital wards, schools, and other settings. 

In this section, we use simulations to illustrate generation interval contraction
under global and local competition among potential infectors. Each simulation
is a single realization of a stochastic SIR model in a population of 10,000.
We keep track of the infection times of the primary and secondary case in each 
infector/infectee pair and the prevalence of infection at the infection time of the 
secondary case, which is a proxy for the amount of competition to infect the 
secondary case. We then calculate a smoothed mean of the generation interval 
as a function of the infection time of the primary case in each pair. Another 
valid approach would be to calculate the smoothed means from the results of 
many simulations. We did not take this approach for the following reasons: 
(i) Because of variation in the time course of different realizations of the same 
stochastic SIR model, many simulations would be required to obtain a curve 
that reliably approximates the asymptotic limit, (ii) The smoothed mean over 
many simulations would show a pattern similar to that obtained in any single 
simulation, (iii) Generation interval contraction was proven in Section 3, so 
the simulations are intended primarily as illustrations. 

All simulations were implemented in Mathematica 5.0.0.0 [©1988-2003 Wolfram
Research, Inc.]. All data analysis was done using Intercooled Stata 9.2
[©1985-2007 StataCorp LP]. All smoothed means are running means with a
bandwidth of 0.8 (the default for the Stata command lowess with the option 
mean). Similar results were obtained for larger and smaller bandwidths. 



4.1 Global competition 

To illustrate global competition among potential infectors, we use a fully-mixed
model with population size $n = 10{,}000$ and basic reproductive number $R_0$.
The infectious period is fixed, with $r_i = 1$ with probability one for all $i$. The
infectious contact intervals $\tau_{ij}$ have an exponential distribution with hazard
$R_0 (n-1)^{-1}$ truncated at $r_i$, so $S_{ij}(\tau \mid r_i) = e^{-R_0 (n-1)^{-1} \tau}$ when $0 < \tau < 1$
and $\tau_{ij} = \infty$ with probability $e^{-R_0 (n-1)^{-1}}$. The epidemic starts with a single
imported infection and no other imported infections occur.

From equation (1), the mean infectious contact interval given that contact
occurs is

$$E[\tau_{ij} \mid \tau_{ij} < \infty] = \int_0^1 \frac{e^{-R_0 \tau (n-1)^{-1}} - e^{-R_0 (n-1)^{-1}}}{1 - e^{-R_0 (n-1)^{-1}}}\, d\tau.$$

For $n = 10{,}000$, Table 1 shows this expected value at each $R_0$. For all $R_0$,
$E[\tau_{ij} \mid \tau_{ij} < \infty] \approx .5$.
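
For an exponential contact interval with hazard a truncated at r, the conditional mean has the closed form 1/a - r/(e^{ar} - 1), obtained by evaluating the integral of the conditional survival function. The sketch below evaluates it for the fully-mixed model; the function name is hypothetical, and `math.expm1` is used to limit rounding error when the hazard is very small.

```python
import math


def mean_contact_interval(hazard, r=1.0):
    """E[tau | tau < inf] for an exponential contact interval truncated at r.

    Closed form of the integral of the conditional survival function:
    1/hazard - r / (exp(hazard * r) - 1).
    """
    return 1.0 / hazard - r / math.expm1(hazard * r)


if __name__ == "__main__":
    n = 10000
    for r0 in (1.25, 1.5, 2, 3, 4, 5, 10):
        mean = mean_contact_interval(r0 / (n - 1))
        print(f"{r0:>5}  {mean:.6f}  {0.5 - mean:.6f}")
```

For small hazards the result is close to 1/2 - a/12, which is why the values at every reproductive number considered are nearly .5.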

This model was run once at $R_0$ = 1.25, 1.5, 2, 3, 4, 5, and 10. For each
simulation, we recorded $t_i$, $v_i$, $t_{v_i}$, and the prevalence of infection at time $t_i$
in each infector/infectee pair. Figure 2 shows smoothed mean curves for the
generation interval versus the source infection time for $R_0$ = 2, 3, 4, 5. There
is a clear tendency for the mean generation interval to contract, with greater
contraction at higher $R_0$. Figure 3 shows smoothed mean curves for the generation
interval and the prevalence of infection versus the source infection time
at each $R_0$; in each case, the greatest contraction of the serial interval coincides
with the peak prevalence of infection (i.e., the greatest competition among potential
infectors). Figure 4 shows the same curves for $R_0$ = 1.25 and 1.50; in
these cases, the generation interval stays relatively constant. These results are
exactly in line with the argument of Section 3.



4.2 Local competition 

To illustrate local competition among potential infectors, we grouped a population
of $n = 9{,}000$ individuals into clusters of size $k$. As before, the infectious
period is fixed at $r_i = 1$ for all $i$. When $i$ and $j$ are in the same cluster, the infectious
contact interval $\tau_{ij}$ has an exponential distribution with hazard $\lambda_{\mathrm{within}}$
truncated at $r_i$, so $S_{ij}(\tau \mid r_i) = e^{-\lambda_{\mathrm{within}} \tau}$ when $0 < \tau < 1$ and $\tau_{ij} = \infty$ with
probability $e^{-\lambda_{\mathrm{within}}}$. When $i$ and $j$ are in different clusters, $\tau_{ij}$ has an exponential
distribution with hazard $\lambda_{\mathrm{between}}$ truncated at $r_i$, so $S_{ij}(\tau \mid r_i) = e^{-\lambda_{\mathrm{between}} \tau}$
when $0 < \tau < 1$ and $\tau_{ij} = \infty$ with probability $e^{-\lambda_{\mathrm{between}}}$.

We fixed the hazard of infectious contact between individuals in the same
cluster at $\lambda_{\mathrm{within}} = .4$. We tuned the hazard of infectious contact between
individuals in different clusters to obtain $R$ mean infectious contacts by infectious
individuals; specifically,

$$\lambda_{\mathrm{between}} = \frac{R - (k-1)(1 - e^{-.4})}{n - k}.$$

We chose $\lambda_{\mathrm{within}} = .4$ to obtain rapid transmission within clusters while retaining
sufficient transmission between clusters to sustain an epidemic. Note that
when $k > R(1 - e^{-.4})^{-1} + 1$, we get the implausible result that $\lambda_{\mathrm{between}} < 0$.
Clearly, $R$ and $k$ must be chosen so that an infectious person makes an average
of $R$ or fewer infectious contacts within his or her cluster, which guarantees that
$\lambda_{\mathrm{between}} \ge 0$.
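
The tuning rule and the resulting upper bound on the cluster size can be written out as a short calculation. This is only a sketch: the function names are hypothetical, and the defaults n = 9,000 and within-cluster hazard .4 are the values stated in this section.

```python
import math


def lambda_between(R, k, n=9000, lam_within=0.4):
    """Between-cluster hazard that yields R mean infectious contacts in total."""
    within_contacts = (k - 1) * (1.0 - math.exp(-lam_within))
    return (R - within_contacts) / (n - k)


def max_cluster_size(R, lam_within=0.4):
    """Largest cluster size k for which lambda_between is nonnegative."""
    return math.floor(R / (1.0 - math.exp(-lam_within)) + 1)


if __name__ == "__main__":
    for R in (2, 3):
        print(R, max_cluster_size(R),
              [round(lambda_between(R, k), 6) for k in (1, 2, 4, 6)])
```

A negative return value from `lambda_between` signals that R and k are incompatible, because within-cluster contacts alone would already exceed R.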

At a given $R$, the mean infectious contact interval given that infectious
contact occurs depends on the cluster size. If the entire population is infectious
and the cluster size is $k$, then a given individual will receive an average of $R$
infectious contacts, of which $(k-1)(1 - e^{-.4})$ come from within his or her cluster.
The mean infectious contact interval for within-cluster contacts is

$$\int_0^1 \frac{e^{-.4\tau} - e^{-.4}}{1 - e^{-.4}}\, d\tau \approx .467,$$

and the mean infectious contact interval for between-cluster contacts is approximately
.5 (as in the models for global competition). Therefore, the mean
infectious contact interval given that contact occurs and the cluster size is $k$ is

$$E[\tau_{ij} \mid \tau_{ij} < \infty, k] \approx \frac{(k-1)(1 - e^{-.4})}{R}\,(.467) + \left(1 - \frac{(k-1)(1 - e^{-.4})}{R}\right)(.5).$$

To compare generation interval contraction for different cluster sizes, we calculated
scaled generation intervals by dividing the observed generation intervals
at each cluster size by $E[\tau_{ij} \mid \tau_{ij} < \infty, k]$. If the mean generation interval
remained constant, we would expect the mean scaled generation interval to be
approximately one throughout an epidemic.
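
The scaling step can be sketched as follows. The mixture form used here is an assumption inferred from the description above (within-cluster contacts weighted by their share of the R expected contacts, with between-cluster contacts given mean .5); the function names are hypothetical.

```python
import math


def mean_contact_interval_by_cluster(R, k, lam_within=0.4, between_mean=0.5):
    """Approximate E[tau_ij | tau_ij < inf, k] as a within/between mixture."""
    # Mean of an exponential contact interval truncated at r_i = 1.
    within_mean = 1.0 / lam_within - 1.0 / math.expm1(lam_within)
    # Share of the R expected contacts that come from the same cluster.
    within_share = (k - 1) * (1.0 - math.exp(-lam_within)) / R
    return within_share * within_mean + (1.0 - within_share) * between_mean


def scaled_generation_intervals(intervals, R, k):
    """Divide observed generation intervals by the expected contact interval."""
    scale = mean_contact_interval_by_cluster(R, k)
    return [x / scale for x in intervals]


if __name__ == "__main__":
    for k in range(1, 7):
        print(k, round(mean_contact_interval_by_cluster(2, k), 4))
```

Scaled values near one indicate no contraction; values below one indicate generation intervals shorter than the contact intervals that would be expected without competition.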

For $R = 2$, we ran the model with cluster sizes of 1 through 6. For $R = 3$, we
ran the model with cluster sizes of 2 through 8. For each simulation, we recorded
$t_i$, $v_i$, $t_{v_i}$, and the prevalence of infection at time $t_i$ in each infector/infectee pair.
Figure 5 shows smoothed mean curves for the generation interval and prevalence
versus the source infection time for several cluster sizes at each $R$. As before,
there is a clear tendency of the mean generation interval to contract. The degree
of contraction is roughly the same for all cluster sizes, but this contraction is
maintained at a lower global prevalence of infection in models with larger cluster
sizes. Similar results were obtained for cluster sizes not shown. Again, these
results are exactly in line with the argument of Section 3.

5 Consequences for estimation 

The effect of generation interval contraction on parameter estimates obtained 
from models that assume a constant generation or serial interval distribution is 
difficult to assess. The assumption of a constant serial or generation interval
distribution may be reasonable in the early stages of an epidemic with little
clustering of contacts, in an epidemic with $R_0$ near one, or in an endemic
situation. However, this ignores the more fundamental issue that estimates of these
distributions are obtained from transmission events where the infector/infectee 
pairs are known (often because of transmission from a known patient within 
a household or hospital ward). Even in the early stages of an epidemic, the 
generation interval distribution in these settings may differ substantially from 
the generation interval distribution for transmission in the general population. 

In this section, we argue that hazards of infectious contact can be used 
instead of generation or serial intervals in the analysis of epidemic data. As an 
example, we look at the estimator of $R(t)$ (the effective reproductive number at
time $t$) derived by Wallinga and Teunis [4] and applied to data on the SARS
outbreaks in Hong Kong, Vietnam, Singapore, and Canada in 2003. In their
paper, the available data was the "epidemic curve" $\mathbf{t} = (t_{(1)}, \ldots, t_{(m)})$, where
$t_{(i)}$ is the infection time of the $i$th person infected. They assume a probability
density function (pdf) $w(\tau \mid \theta)$ for the serial interval given a vector $\theta$ of parameters
(note that this parameter vector applies to the population, not to individuals).
The infector of person $(i)$ is denoted by $v_{(i)}$, with $v_{(i)} = 0$ for imported infections.
The "infection network" is a vector $\mathbf{v} = (v_{(1)}, \ldots, v_{(m)})$ specifying the source of
infection for each infected person. With these assumptions, the likelihood of $\mathbf{v}$
and $\theta$ given $\mathbf{t}$ is

$$L(\mathbf{v}, \theta \mid \mathbf{t}) = \prod_{i : v_{(i)} \ne 0} w(t_{(i)} - t_{v_{(i)}} \mid \theta).$$

The sum of this likelihood over the set $V$ of all infection networks consistent
with the epidemic curve $\mathbf{t}$ is

$$L(\theta \mid \mathbf{t}) = \prod_{i : v_{(i)} \ne 0} \sum_{j \ne i} w(t_{(i)} - t_{(j)} \mid \theta).$$

Taking a likelihood ratio, Wallinga and Teunis argue that the relative likelihood
that person $k$ was infected by person $j$ is

$$p_{jk}^{(WT)} = \frac{w(t_k - t_j \mid \theta)}{\sum_{i \ne k} w(t_k - t_i \mid \theta)}. \tag{4}$$

The number $R_j$ of secondary infections generated by person $j$ is a sum of
Bernoulli random variables with expectation

$$E[R_j] = \sum_{k=1}^{m} p_{jk}^{(WT)}.$$

An estimate of the effective reproductive number $R(t)$ can be obtained by calculating
a smoothed mean for a scatterplot of $(t_j, E[R_j])$. This analysis is
ingenious, but it can be only approximately correct because the distribution of
serial intervals varies systematically over the course of an epidemic.
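
The relative-likelihood calculation can be sketched in a few lines. This is an illustration of the formulas as described above, not the code of Wallinga and Teunis: the function names are hypothetical, the serial interval pdf `w` is supplied by the caller and is assumed to return zero for non-positive intervals, and a case with no possible infector (such as the first case) is treated as imported.

```python
import math


def wt_relative_likelihoods(times, w):
    """p[j][k]: relative likelihood that case k was infected by case j."""
    m = len(times)
    p = [[0.0] * m for _ in range(m)]
    for k in range(m):
        weights = [w(times[k] - times[i]) if i != k else 0.0 for i in range(m)]
        total = sum(weights)
        if total > 0:  # otherwise k has no candidate infector in the data
            for j in range(m):
                p[j][k] = weights[j] / total
    return p


def expected_secondary_cases(times, w):
    """E[R_j] for each case j: the row sums of the relative likelihoods."""
    return [sum(row) for row in wt_relative_likelihoods(times, w)]


if __name__ == "__main__":
    # Hypothetical serial interval pdf: exponential with mean one.
    pdf = lambda x: math.exp(-x) if x > 0 else 0.0
    print(expected_secondary_cases([0.0, 1.0, 2.0, 3.0], pdf))
```

Each non-imported case contributes a total weight of one across its candidate infectors, so the expected numbers of secondary cases sum to the number of non-imported cases.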






5.1 Hazard-based estimator 



A very similar result can be derived by applying the theory of order statistics
(see Ref. [12]) to the general stochastic SIR model from Section 2. Specifically,
we use the following results: If $X_1, \ldots, X_n$ are independent non-negative random
variables, then their minimum $X_{(1)}$ has the hazard function

$$\lambda_{(1)}(x) = \sum_{i=1}^{n} \lambda_i(x).$$

Given that the minimum is $x_{(1)}$, the probability that $X_j = X_{(1)}$ (i.e. that the
minimum was observed in the $j$th random variable) is

$$\frac{\lambda_j(x_{(1)})}{\sum_{i=1}^{n} \lambda_i(x_{(1)})}.$$

For simplicity, we assume that the infectious contact intervals are absolutely
continuous random variables.
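
The second result is easy to check by simulation in the simplest setting. The sketch below assumes constant hazards (exponential variables), so that the probability of source j attaining the minimum is its hazard divided by the sum of all hazards regardless of when the minimum occurs; the function name is hypothetical.

```python
import random


def minimum_source_frequencies(hazards, trials=60000, seed=2):
    """Fraction of trials in which each exponential variable is the minimum."""
    rng = random.Random(seed)
    wins = [0] * len(hazards)
    for _ in range(trials):
        draws = [rng.expovariate(h) for h in hazards]
        wins[draws.index(min(draws))] += 1
    return [count / trials for count in wins]


if __name__ == "__main__":
    hazards = [1.0, 2.0, 3.0]
    total = sum(hazards)
    print([round(f, 3) for f in minimum_source_frequencies(hazards)])
    print([round(h / total, 3) for h in hazards])
```

The two printed lists should agree up to Monte Carlo error.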

Let $\lambda_{ij}(\tau \mid r_i)$ be the conditional hazard function for $\tau_{ij}$ given $r_i$ and let $\lambda_{0i}(t)$
be the hazard function for infectious contact to $i$ from outside the population
at time $t$. Since $\tau_{ij}$ is nonnegative, $\lambda_{ij}(\tau \mid r_i) = 0$ whenever $\tau < 0$. Let $H(t)$
denote the set of infection times and recovery periods for all $i$ such that $t_i < t$.
If person $k$ is susceptible at time $t$, his or her total hazard of infection at time
$t$ given $H(t)$ is $\sum_{i=0}^{n} \lambda_{ik}(t - t_i \mid r_i)$, where we let $\lambda_{0k}(t - t_0 \mid r_0) = \lambda_{0k}(t)$ for
simplicity of notation. If an infection occurs in person $k$ at time $t_k < \infty$, then
the conditional probability that person $j$ infected person $k$ given $H(t_k)$ is

$$p_{jk} = \frac{\lambda_{jk}(t_k - t_j \mid r_j)}{\sum_{i=0}^{n} \lambda_{ik}(t_k - t_i \mid r_i)}, \tag{5}$$

which is the probability that $t_{jk} = \min(t_{0k}, t_{1k}, \ldots, t_{nk})$. This has the same
form as equation (4) except that it uses hazards of infectious contact instead
of a pdf for the serial interval. If the hazards of infectious contact in the
underlying SIR model do not change over the course of an epidemic, then $p_{jk}$
can be estimated accurately throughout an epidemic. Unlike the assumption of
a stable generation or serial interval distribution, this assumption is unaffected
by competition among potential infectors. The rest of the estimation of $R(t)$
could proceed exactly as in Ref. [4], replacing $p_{jk}^{(WT)}$ with $p_{jk}$.
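
The hazard-based probabilities can be computed directly from an infection history. The following is a minimal sketch under assumptions made here: the history is a dictionary of (infection time, recovery period) pairs, the pairwise hazard is supplied by the caller, and label 0 stands for sources outside the population. None of these names come from the paper.

```python
def infector_probabilities(k, t_k, history, hazard, external_hazard=lambda k, t: 0.0):
    """Probability that each source infected person k at time t_k.

    history maps person i to (t_i, r_i) for persons infected before t_k.
    hazard(i, k, tau, r_i) returns the hazard of infectious contact from
    i to k at time tau after the infection of i.
    Returns a dict mapping each source j (0 = outside) to p_jk.
    """
    rates = {0: external_hazard(k, t_k)}
    for i, (t_i, r_i) in history.items():
        if i != k and t_i < t_k:
            rates[i] = hazard(i, k, t_k - t_i, r_i)
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("no source has a positive hazard at time t_k")
    return {j: rate / total for j, rate in rates.items()}


if __name__ == "__main__":
    # Hypothetical hazard: constant 0.5 while the source is still infectious.
    flat = lambda i, k, tau, r_i: 0.5 if 0 < tau < r_i else 0.0
    history = {1: (0.0, 1.0), 2: (0.5, 1.0), 3: (0.2, 0.3)}
    print(infector_probabilities(4, 0.8, history, flat))
```

In the example, person 3 has already recovered by time 0.8 and so receives probability zero, while persons 1 and 2 share the probability equally.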

5.2 Partial likelihood for epidemic data 

A partial likelihood for epidemic data can be derived using the same logic as
that used to derive $p_{jk}$ in equation (5). For each person $k$ such that $t_k < \infty$,
the probability that the failure at time $t_k$ occurred in person $k$ given $H(t_k)$ is

$$\frac{\sum_{i=0}^{n} \lambda_{ik}(t_k - t_i \mid r_i)}{\sum_{j=1}^{n} 1_{t_j \ge t_k} \sum_{i=0}^{n} \lambda_{ij}(t_k - t_i \mid r_i)}, \tag{6}$$

where the numerator is the hazard of infection (from all sources) in person $k$ at
time $t_k$ and the denominator is the total hazard of infection for all persons at
risk of infection at time $t_k$ (the indicator $1_{t_j \ge t_k}$ selects the persons still at risk).

If there is a vector of covariates $x_{ij}$ for each pair $ij$ (which may include
individual-level covariates for $i$ and $j$ as well as pairwise covariates for the ordered
pair $ij$) and a vector of parameters $\theta$ such that $\lambda_{ij}(\tau \mid r_i) = \lambda(\tau \mid r_i, x_{ij}, \theta)$,
then a partial likelihood for $\theta$ can be obtained by multiplying equation (6) over
all $m$ observed failure times. If $(k)$ denotes the index of the $k$th person infected,
$\mathbf{t} = (t_1, \ldots, t_n)$, and $X = \{x_{ij} : i, j = 1, \ldots, n\}$, then the partial likelihood is

$$L_p(\theta \mid \mathbf{t}, X) = \prod_{k=1}^{m} \frac{\sum_{i=0}^{n} \lambda(t_{(k)} - t_i \mid r_i, x_{i(k)}, \theta)}{\sum_{j=1}^{n} 1_{t_j \ge t_{(k)}} \sum_{i=0}^{n} \lambda(t_{(k)} - t_i \mid r_i, x_{ij}, \theta)}. \tag{7}$$

This is very similar to partial likelihoods that arise in survival analysis, so many 
techniques from survival analysis may be adaptable for use in the analysis of 
epidemic data. 
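
A log partial likelihood of this kind can be sketched as follows. This is an illustration under assumptions made here, not the authors' implementation: there are no outside sources, persons are indexed from zero, never-infected persons have infinite infection time, and failure times with no infectious source inside the population (such as an imported first case) contribute no term. The function and argument names are hypothetical.

```python
import math


def log_partial_likelihood(times, recoveries, hazard, theta):
    """Log partial likelihood for infection times under a pairwise hazard.

    times[i] is the infection time of person i (math.inf if never infected),
    recoveries[i] is the recovery period r_i, and
    hazard(i, j, tau, r_i, theta) returns the hazard of infectious contact
    from i to j at time tau after the infection of i.
    """
    n = len(times)
    total = 0.0
    for k in range(n):
        t_k = times[k]
        if t_k == math.inf:
            continue  # person k never fails
        sources = [i for i in range(n) if times[i] < t_k]

        def hazard_to(j):
            return sum(
                hazard(i, j, t_k - times[i], recoveries[i], theta) for i in sources
            )

        numerator = hazard_to(k)
        if numerator <= 0.0:
            continue  # no internal source: treated as an imported infection
        # Persons still at risk just before t_k, including k itself.
        at_risk = [j for j in range(n) if times[j] >= t_k]
        denominator = sum(hazard_to(j) for j in at_risk)
        total += math.log(numerator / denominator)
    return total


if __name__ == "__main__":
    flat = lambda i, j, tau, r_i, theta: theta if 0 < tau < r_i else 0.0
    times = [0.0, 0.5, 0.8, math.inf]
    recoveries = [1.0, 1.0, 1.0, math.inf]
    print(log_partial_likelihood(times, recoveries, flat, 0.7))
```

With a hazard that is identical for all pairs, the parameter cancels and each term reduces to one over the number of persons at risk, as with a covariate-free partial likelihood in survival analysis; covariates are what make the likelihood informative about the parameters.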

The goal of such methods would be to allow statistical inference about the 
effects of individual and pairwise covariates on the hazard of infection in ordered 
pairs of individuals. In the ordered pair $ij$, the effects of individual covariates for
$i$ and $j$ on $\lambda_{ij}(\tau \mid r_i)$ would reflect the infectiousness of $i$ and the susceptibility of
$j$, respectively. Pairwise covariates could include such information as whether $i$
and $j$ are in the same household, the distance between their households, whether
they are sexual partners, and any other aspects of their relationship to each other
that may affect the hazard of infection from $i$ to $j$.

This approach has several advantages over any approach based on a distri- 
bution of generation or serial intervals. First, it is not necessary to determine 
who infected whom in any subset of observed infections. If $v_j$ is known for some
$j$, this knowledge can be incorporated in the partial likelihood by replacing the
term for the failure time of person $j$ in (7) with $p_{v_j j}$ from equation (5). Second,
this approach allows the use of individual-level and pairwise covariates for
inference in a flexible and intuitive way. The resulting estimated hazard functions
have a straightforward interpretation and can be incorporated naturally into a 
stochastic SIR model. Third, this approach allows theory and methods from 
survival analysis to be applied to the analysis of epidemic data. 

6 Discussion 

Generation and serial interval distributions are not stable characteristics of an 
infectious disease. When multiple infectious persons compete to infect a given 
susceptible person, infection is caused by the first person to make infectious 
contact. In Section 3, we showed that the mean infectious contact interval $\tau_{ij}$
given that $i$ actually infected $j$ is less than or equal to the mean $\tau_{ij}$ given $i$ made
infectious contact with $j$. That is,

$$E[\tau_{ij} \mid v_j = i] \le E[\tau_{ij} \mid \tau_{ij} < \infty],$$

with strict inequality when $\tau_{ij}$ is non-constant and $j$ is at risk of infectious
contact from any source other than $i$ (more precise conditions are given in
Section 3). This result holds for all time-homogeneous stochastic SIR models.

In an epidemic, the mean generation (and serial) intervals contract as the 
prevalence of infection increases and susceptible persons are at risk of infectious 
contact from multiple sources. In the simulations of Section 4, we saw that the 
degree of contraction increases with $R_0$. For models with clustering of contacts,
generation interval contraction can occur even when the global prevalence of in- 
fection is low because susceptibles are at risk of infectious contact from multiple 
sources within their own clusters. In all of the simulations, the greatest serial 
interval contraction coincided with the peak prevalence of infection, when the 
risk of infectious contacts from multiple sources was highest. The mean gener- 
ation interval increases again as the epidemic wanes, but this rebound may be 
small when $R_0$ is high.

The reason that generation and serial intervals contract during an epidemic 
is that their definition applies to pairs of individuals ij such that i actually 
transmitted infection to j. If we don't require that an infectious contact leads 
to the transmission of infection, we are led naturally to the concept of the 
infectious contact interval, which has a well-defined distribution throughout an 
epidemic. Similarly, we can define $R_0$ as the mean number of infectious contacts
(i.e., finite infectious contact intervals) made by a primary case without reference 
to a completely susceptible population. Generation and serial intervals and the 
effective reproductive number can then be defined in terms of infectious contacts 
that actually lead to the transmission of infection. Many fundamental concepts 
in infectious disease epidemiology can be simplified usefully by defining them in 
terms of infectious contact rather than infection transmission. 

Infectious contact hazards for ordered pairs of individuals can be used for 
many of the same types of analysis that have been attempted using generation 
or serial interval distributions. In Section 5, we derived a hazard-based
estimator of $R(t)$ very similar to that developed by Wallinga and Teunis [4]. This
derivation led naturally to a partial likelihood for epidemic data very similar to 
those that arise in survival analysis. We believe that the adaptation of methods 
and theory from survival analysis to infectious disease epidemiology will yield 
flexible and powerful tools for epidemic data analysis. 
$R_0$     $E[\tau_{ij} \mid \tau_{ij} < \infty]$     $.5 - E[\tau_{ij} \mid \tau_{ij} < \infty]$
1.25      .49999                                    .00001
1.5       .499988                                   .000012
2         .499983                                   .000017
3         .499975                                   .000025
4         .499967                                   .000033
5         .499958                                   .000042

Table 1: Expected infectious contact interval given that infectious contact occurs 
in the models illustrating global competition among potential infectors. If the 
generation interval were constant, this would be the mean generation interval 
throughout an epidemic. 
Figure 1: Schematic diagram of variables in the general stochastic SIR model
for the ordered pair $ij$. Recall that $t_j \le t_{ij}$. As discussed in Section 3.2, person
$i$ develops symptoms at time $t_i^{\mathrm{sym}} = t_i + q_i$, where $q_i$ is the incubation period.






[Figure 2 plot. Vertical axis: smoothed mean generation interval. Horizontal axis: source infection time. One curve each for $R_0$ = 2, 3, 4, 5.]

Figure 2: The smoothed mean generation interval as a function of the source
infection time for $R_0$ = 2, 3, 4, 5. There is a clear tendency to contract, with
greater contraction for higher $R_0$.
Figure 3: The smoothed mean generation interval (solid lines) and prevalence
(dotted lines) as a function of the source infection time for $R_0$ = 2, 3, 4, 5. In
all cases, the greatest contraction of the serial interval coincides with the peak
prevalence of infection (i.e., the greatest competition among potential infectors).
Figure 4: The smoothed mean generation intervals (solid lines) and prevalence
(dotted lines) as a function of the source infection time for $R_0$ = 1.25 and 1.50.
For $R_0$ near one, the mean generation interval stays relatively constant.
Figure 5: The smoothed mean scaled generation interval (SGI) and prevalence
as a function of the source infection time for $R = 2$ and $R = 3$. With increasing
cluster size, the degree of generation interval contraction is roughly the same
even though the peak prevalence of infection is lower.